An Efficient Optimization Algorithm for Structured Sparse CCA, with Applications to eQTL Mapping
Authors
Abstract
In this paper we develop an efficient optimization algorithm for solving canonical correlation analysis (CCA) with complex structured-sparsity-inducing penalties, including the overlapping-group-lasso penalty and the network-based fusion penalty. We apply the proposed algorithm to an important genome-wide association study problem, eQTL mapping. We show that, with this algorithm, one can readily incorporate rich structural information among genes into the sparse CCA framework, which improves the interpretability of the results. The algorithm is based on a general excessive gap optimization framework and can scale up to millions of variables. We demonstrate its effectiveness on both simulated and real eQTL datasets.
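For readers unfamiliar with the two penalties named in the abstract, the sketch below evaluates them in their commonly used textbook forms: the overlapping group lasso as a weighted sum of Euclidean norms over possibly overlapping index groups, and the network fusion penalty as a weighted sum of absolute differences across graph edges. The function names, the edge tuple layout `(i, j, weight, sign)`, and the default unit weights are assumptions made for illustration; the paper's exact weighting scheme may differ.

```python
import math

def overlapping_group_lasso(v, groups, weights=None):
    """Sum over groups g of w_g * ||v_g||_2.

    `groups` is a list of index lists; an index may appear in several
    groups (that is what makes the penalty "overlapping").
    Unit weights are an illustrative default, not the paper's choice.
    """
    if weights is None:
        weights = [1.0] * len(groups)
    return sum(
        w * math.sqrt(sum(v[i] ** 2 for i in g))
        for g, w in zip(groups, weights)
    )

def network_fusion(v, edges):
    """Sum over edges (i, j, w, s) of w * |v_i - s * v_j|.

    `s` is +1 or -1 (for example, the sign of the correlation between
    the two variables); `w` is the edge weight. Hypothetical layout.
    """
    return sum(w * abs(v[i] - s * v[j]) for i, j, w, s in edges)

# Toy coefficient vector over three genes.
v = [3.0, 4.0, 0.0]
groups = [[0, 1], [1, 2]]                 # gene 1 belongs to both groups
edges = [(0, 1, 1.0, 1), (1, 2, 2.0, -1)]  # two weighted, signed edges

group_penalty = overlapping_group_lasso(v, groups)
fusion_penalty = network_fusion(v, edges)
```

Both penalties are non-smooth and, in the overlapping case, non-separable across coordinates, which is why a specialized solver is needed rather than plain coordinate-wise soft-thresholding.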
Similar resources
Structured Sparse Canonical Correlation Analysis
In this paper, we propose to apply sparse canonical correlation analysis (sparse CCA) to an important genome-wide association study problem, eQTL mapping. Existing sparse CCA models do not incorporate structural information among variables such as pathways of genes. This work extends the sparse CCA so that it could exploit either the pre-given or unknown group structure via the structured-spars...
Structured Input-Output Lasso, with Application to eQTL Mapping, and a Thresholding Algorithm for Fast Estimation
We consider the problem of learning a high-dimensional multi-task regression model, under sparsity constraints induced by the presence of grouping structures on the input covariates and on the output predictors. This problem is primarily motivated by expression quantitative trait locus (eQTL) mapping, whose goal is to discover genetic variations in the genome (inputs) that influence the expr...
A Penalized Regression Model for the Joint Estimation of eQTL Associations and Gene Network Structure
Background: A critical task in the study of biological systems is understanding how gene expression is regulated within the cell. This problem is typically divided into multiple separate tasks, including performing eQTL mapping to identify SNP-gene relationships and estimating gene network structure to identify gene-gene relationships. Aim: In this work, we pursue a holistic approach to discove...
Accounting for non-genetic factors by low-rank representation and sparse regression for eQTL mapping
Motivation: Expression quantitative trait loci (eQTL) studies investigate how gene expression levels are affected by DNA variants. A major challenge in inferring eQTL is that a number of factors, such as unobserved covariates, experimental artifacts and unknown environmental perturbations, may confound the observed expression levels. This may both mask real associations and lead to spurious asso...
Tree-guided Group Lasso for Multi-response Regression with Structured Sparsity, with an Application to eQTL Mapping, by Seyoung Kim
We consider the problem of estimating a sparse multi-response regression function, with an application to expression quantitative trait locus (eQTL) mapping, where the goal is to discover genetic variations that influence gene-expression levels. In particular, we investigate a shrinkage technique capable of capturing a given hierarchical structure over the responses, such as a hierarchical clus...
Publication date: 2012